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Abstract 

Practitioners of Markov chain Monte Carlo (MCMC) may hesitate to use random 
walk Metropolis-Hastings algorithms, especially variable-at-a-time algorithms with 
many parameters, because these algorithms require users to select values of tuning 
parameters (step sizes). These algorithms perform poorly if the step sizes are set to 
be too low or too high. We show in this paper that it is not difficult for an algorithm 

CO to tune these step sizes automatically to obtain a desired acceptance probability, since 

the logit of the acceptance probability is very nearly linear in the log of the step size, 
with known slope coefficient. These ideas work in most applications, including single 
parameter or block moves on the linear, log, or logit scales. We discuss the imple- 

+^ mentation of this algorithm in the software package YADAS. 
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1 Introduction 

Some users of Markov chain Monte Carlo (MCMC) methods in Bayesian statisti- 
cal analyses may be reluctant to implement samplers that rely on random walk 
(T) Metropolis-Hastings moves, because these samplers require users to choose values 

of tuning parameters. The performance of the samplers are sensitive to the choices of 
these parameters. As a result, users may elect to modify the statistical model they use 
to allow them to use the Gibbs sampler with tractable conditional distributions. This 
fear of tuning parameters is unfounded. In the vast majority of examples, it is very 
easy for an algorithm to tune these parameters automatically in an initial burn-in 
phase. This is true because it generally suffices to aim for a given acceptance rate for 
proposed moves, and because the logit of the acceptance rate is almost always linear 
in the step size (proposal standard deviation), with a slope coefficient that can be 
taken to be known. The burn-in phase can feature a designed experiment to estimate 
the intercept of the logistic regression, which can then be used to set the values of the 
tuning parameters for the remainder of the algorithm. This algorithm is implemented 
in YADAS (Graves, 2001, 2003, 2007). 

Interesting cases will involve the need to tune several step sizes simultaneously. 
There is as yet no indication that this should pose any problems. Critically, the 
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approach and the software implementation work not just for single parameter updates 
but also for YADAS's block updates, for positive parameters updated on the log scale, 
and for probability parameters updated on the logit scale. 

In a block update, a Metropolis (-Hastings) move proposes a simultaneous alter- 
ation to multiple parameters, and such updates are very helpful in situations with 
highly correlated parameters. For example, suppose that in a hierarchical model, 
several a priori exchangeable parameters 9i have a common unknown mean //, and 
the remaining parameters are such that the value of \x is uncertain, but the $i are 
all close to /i. (This situation applies in one-way ANOVA with some values for the 
random effect variances; see Graves (2003b), Example 4.) Moving fx and the 0j inde- 
pendently works poorly, because none of these parameters can move quickly due to 
being forced to remain close to the others. However, excellent mixing is possible when 
a naive algorithm is augmented with a step that chooses Z ~ N(0, 1), and proposes 
candidate values Of and ji c according to Of = Qi + sZ for all i and /i c = \i + sZ. 
Choosing a relatively efficient value of s > is the topic of this paper. See Graves, 
Speckman, and Sun (2003), for example, for theory and examples of block updates. 

This algorithm is not a panacea, since many samplers have more serious problems 
than the choice of tuning parameters. However, this algorithm can play a key role in 
problems which only need suitable values of step sizes, even a large number of them. 

Related work 

Gelman, Roberts, and Gilks (1995) work with algorithms consisting of a single Metropo- 
lis move (not variable-at-a-time), and obtain many interesting results for the d- 
dimensional spherical multivariate normal problem with a symmetric proposal dis- 
tributions, including that the optimal scale is approximately 2.4c? -1 / 2 times the scale 
of the target distribution, which implies optimal acceptance rates of 0.44 for d — 1 
and 0.23 for d — > oo. 

Roberts and Rosenthal (2001) evaluate scalings that are optimal (in the sense of 
integrated autocorrelation times) asymptotically in the number of components. They 
find that an acceptance rate of 0.234 is optimal in many random walk Metropolis 
situations, but their studies are also restricted to algorithms that consist of only a 
single step in each iteration so are not directly applicable here, and in any case they 
conclude that acceptance rates between 0.15 and 0.5 do not cost much efficiency. 

Yeung and Wilkinson (2002) model lagged autocorrelation as, for example, a 
quadratic response surface in tuning parameters, and use stochastic search algorithms 
to obtain good values for the tuning parameters. Their focus is on comparing vari- 
ous MCMC algorithms (standard Gibbs sampling vs. block updating, for example) 
more than on demonstrating the efficiency of the tuning methodology. Pasarica and 
Gelman (2004) aim to maximize the expected squared distance between successive 
MCMC samples, since this is equivalent to minimizing first order autocorrelation, 
and use importance sampling estimates of this quantity for several step sizes and 
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numerical optimization. This procedure is promising and includes the case of simul- 
taneously optimizing multiple tuning parameters Its performance is likely to suffer 
with increasing dimensionality of the tuning parameter, but it may be adaptable to 
several individual optimizations instead of a single large one, which would be appro- 
priate for the variable-at-a-time case. 

Andrieu and Thorns (2008) present several important adaptive algorithms that 
tune step sizes and create block updates, among other things. We encourage the 
reader to investigate their work as well. 



2 Philosophy 

Markov chain Monte Carlo (MCMC) algorithms are often too badly flawed to be saved 
by judicious choice of step sizes. This work assumes that they are not and that the 
algorithm contains the right composition of steps. If correlations between parameters 
or multimodality prevent adequate exploration of the posterior distribution regardless 
of the values of tuning parameters, more drastic measures are necessary. See, for 
example, Liu and Rubin (2002), and Andrieu and Thorns (2008). 

We aim to tune step sizes to achieve a desired acceptance rate that is the same 
for every update of every problem (we use 1/e as the target acceptance rate, because 
this gives the false impression that the target is a result of a theoretical study). 
This strategy is almost certainly suboptimal but has worked well in practice for us. 
Sherlock, Fearnhead, and Roberts (2010) also tune various algorithms to attain target 
acceptance rates, and their Algorithm 2 tunes step sizes of univariate updates to attain 
acceptance rates between 0.4 and 0.45. 

Once the goal of tuning step sizes to attain acceptance rates of 1/e (say) is ac- 
cepted, it is surprisingly easy to achieve it. We find that the logit of the acceptance 
rate is very nearly linear in the log of the step size. Consider, for example, the 
case where the posterior distribution f(x) = (27r) -1 / 2 exp(— x 2 /2) is standard nor- 
mal. Given that the current state of the Markov chain is x, we propose a new state 
y ~ N(x, s 2 ) (i.e. the transition proposal density is T(x,y) = (2tts)' 1 ^ 2 exp{ — (y — 
x) 2 /2s 2 }), define R(x,y) = y^'yj ^M, and accept the move to state y with proba- 
bility min{l, R(x, y)}. The long-run acceptance rate is obtained by integrating the 
acceptance probability with respect to the joint distribution of (x,y), namely 

/oo roo 
/ min{l, R(x, y)}T(x, y) dy f(x) dx 
-oo J —oo 

= T(y,x) f(y) dy dx + / T(x,y) f(x) dy dx, 

J J{(x,y):R(x,y)<l} J J {(x,y):R(x,y)>l} 

which, if the proposal distribution is symmetric (if we are using the Metropolis algo- 
rithm, with T(x,y) = T(y,x)), and if it is also continuous, reduces to 



/ / T(x,y) f(x) dy dx. 
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Observing that in the case where x and y\x are both normal, f(x) < f(y) if and only 
if |x| > \y\, this can be rewritten 

where S s is the covariance matrix of (x,y) (En = £ 12 = £21 = 1 and £ 22 = 1 + s 2 ). 
Next change variables to the independent standard normal (u v) T = S i 7 1 / 2 (a; y) T . It 
can be verified directly that 

^ = (2 + 2s + S v( 1 t S ! + 

so that 

{(u,v) : \x\ > \y\} = {(u,v) : |(1 + s)u + v\ > \u + (1 + s + s 2 )v|}. 
Our acceptance rate therefore reduces to 

ill (27r)~ 1 exp(-^(u 2 + v 2 )} dv du 

J J {{u,v):\(l+s)u+v\>\u+(l+s+s 2 )v\} 12 J 

= ±j™f_% s (2 7 r)- 1 exp{-i( M 2 + ,; 2 )} dvdu 



oo I 



4 / ^(T J ^)-^( n a?(2 + 1 



Gelman, Roberts, and Gilks (1995) report, without elaboration, that this acceptance 
rate "can be determined analytically" and equals - arctan(-). One way to check this 
is to substitute t = 2/s, and show that the derivative of this function with respect to 
t is equal to 2 (1 + 1 2 ) -1 . After differentiating, the integration can easily be done in 
closed form, leaving some algebra to be done. 

Plotting the logit of - arctan( 2 ) against log(s) demonstrates the very close approx- 
imate linearity; see Figure [l] Denote this function by pi(s); the subscript 1 refers to 
the standard deviation of the posterior distribution. We have 

logit px(s) ~ 0.76 - 1.121og(s) 

using a least squares fit of the numerically integrated function. Observe further that, 
since clearly p a (s) — pi(s/a), we have 

logit p a (s) ~ 0.76- 1.12 log(s/cr) = (0.76 + 1.12 log a) - 1.12 logs, 

so that the slope of the relationship is nearly equal to the constant —1.12, indepen- 
dently of a. Clearly, this result holds for other means for x, and one expects the 
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Figure 1: Approximate linear relationship of the logit acceptance rate and the log of 
the step size. 



same result to hold where x is a single component of a parameter with a(n approx- 
imately) multivariate normal posterior. Therefore, to find an appropriate step size 
for an approximately normal marginal distribution, one can consider collecting ac- 
ceptance rate data for various step sizes and fitting a logistic regression with known 
slope. The marginal distribution does not need to be very close to normal, either: for 
example, simulations indicate that the slope for an exponential posterior distribution 
is about —1.08, and so is the slope for a t distribution with 2 degrees of freedom. 
These slopes are close enough to —1.12 that it is better to use the fixed slope than to 
try to estimate a slightly different one using data. 

Certainly, one could use the actual arctangent relationship to try to choose a 
good s: in the univariate example, if p is the desired acceptance rate, then we obtain 
s = 2a/ tan(|p), where a is the posterior standard deviation, so we only need to 
estimate a. (Examples of results include s = 2.4a for p = 0.44, s — 3.1a for p — 1/e, 
and s = 5.3a for p = 0.23.) However, in variable-at-a-time random walk Metropolis 
updates, one expects that the proper interpretation of a is not the posterior standard 
deviation but the average conditional standard deviation, which is presumably more 
difficult to estimate from a Metropolis algorithm. 

In our experience, this approach works equally well for positive parameters up- 
dated on the log scale with 

T(x,y) = (2ns)- l / 2 y- l ex P {-(logy-logx) 2 /2s 2 } 
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and with parameters in (0, 1) updated on the logit scale with 

T(x, y) = {2itsy 1/2 [y{l - y)]' 1 exp{-(logit y - logit x) 2 /2s 2 }. 

We essentially always update standard deviation parameters, and, respectively, prob- 
ability parameters in these ways. Our approach also works well if the parameter x is 
a vector of probabilities that sum to one, and if we propose a new value by adding a 
Gaussian perturbation to logit Xi and rescaling the other x/s so that they still sum 
to one (and we have a step like this for each i). 



2.1 Trial stage and logistic regression 

To take advantage of this relationship, we begin our MCMC algorithm with a trial 
stage: the user specifies an initial guess for each proposal standard deviation. The 
trial stage loops through thirteen (for example) logarithmically spaced step sizes fifty 
times for each, and monitors the number of accepted moves for each step size. If one 
then overlooks the fact that the slope in this regression can be taken to be known, 
one can then fit a logistic regression: 

logit (acceptance probability with step size s) = a + Mog(s) 

using Newton-Raphson, and use the estimated parameters to try to hit a target 
acceptance rate s = b~ 1 (logit(p tar get) ~ The Newton-Raphson algorithm works as 
follows. Let and be the estimates of a and b after k iterations. Let rii be 
the number of trials for the ith step size, Xi be the number of accepted moves, and Sj 
be the ith step size. Let pf* = logit" 1 ^^ + b^ logSj) be the estimated acceptance 
probabilities for step size % after k iterations. Further let A^> = X^rijpf^(l — pf^), 

= E i n l (\ogs l )p ( t\l-p ( t ) ), and (?« = n 4 (log Si fpf\l -#>). The Newton- 
Raphson equations are 

a (*+i) = fl (fc) + {A (k) J2( Xl - n^) - ]T(log Sl )(x % - mp^)}/(A^C^ - B^ 2 ) 

i i 

S(*+i) = 6« - {BW ~ n iPf ] ) + C(k) E( lo S - nip^)}/(A^C^ - B^ 2 ). 

i i 



In practice, this algorithm converges quickly (in less than twenty iterations) with 
starting values + b^ = 0, for essentially all the data we have tried. 

This algorithm has been implemented in YADAS in case we find an application 
where the logistic slope differs substantially from -1.12. Most often, though, one 
should fix the slope and estimate the intercept alone. Using the notation above, the 
algorithm then updates the intercept a as follows: 

a (k+l) =a (k) + ^i n i(^-Pi k) ) 
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This algorithm is less numerically stable. It can diverge readily, even when the al- 
gorithm is augmented by step halving when the change to a shrinks the likelihood. 
For this and other reasons, we actually perform the estimate with a "prior" for a in- 
cluded for regularization purposes. A normal prior with mean ji a = —3 and standard 
deviation a a = 5 implies that the optimal step size will normally be between 1CT 5 
and 10 3 , and modifies the updating algorithm for a to 



u a 

One may note that an experiment with thirteen different step sizes may be very 
conservative compared to what is actually required. In principle, since the slope is 
known, if one knows the acceptance rate accurately for one step size, one can use that 
information alone to tune it to a different acceptance rate. However, spending a few 
hundred iterations in this manner has not seemed overly expensive to us, since we 
normally run the algorithm for many more iterations after tuning. 



2.2 Simulation experiment 

Here we discuss a simulation experiment that explores how accurate the initial guesses 
for step sizes need to be, and how many logarithmically spaced step sizes and how 
many attempted steps at each step size are needed. Our experiment assumed that the 
logistic regression model was correct, and that the acceptance rate for step size s is 
logit _1 { — 5.7 — 1.121og(s)}; the constants were chosen so that the "optimal" step size 
is 0.01. The full factorial simulation experiment tried initial guesses of 0.01 x 2 k for 
k = {—7, —6, . . . , 0, . . . , 6, 7}, 3, 5, 7, 9, 11, 13, and 15 different step sizes per run, and 
10,20,30,40, and 50 attempted steps per step size. (For example, suppose that the 
initial guess was 0.16 and the number of step sizes was 9. The step sizes considered 
would be 0.01,0.02,0.04,0.08,0.16,0.32,0.64,1.28, and 2.56. These step sizes yield 
acceptance rates of 0.368, 0.700, 0.903, 0.974, 0.993, 0.998, 1.000, 1.000, and 1.000.) For 
each combination of these variables, we constructed 100 simulated data sets using the 
logistic regression model, estimated a with fixed b = —1.12145 using the Newton- 
Raphson algorithm, recommended a step size based on the estimated a and b, and 
counted how many times out of 100 the recommended step size would generate an 
acceptance rate between 0.25 and 0.45. (For the example with initial guess 0.16 and 
nine step sizes, the number of "successes" with 10, 20, 30, 40, and 50 attempts per 
step size were 73, 88, 84, 92, and 91 respectively.) The simulation results are a little 
erratic to report: the discreteness of the possibilities means that by chance a good 
step size can be chosen although the data are inadequate. For example, if the initial 
guess is 0.01 x 2 7 , we only attempt three step sizes ten times each, and we get zero 
acceptances at each step size (which happens about 92% of the time), the algorithm 
will choose a step size of 0.011, for an acceptance rate of 34%. The designs adjacent 
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to this case all have essentially zero probability of getting an acceptable step size. 
Less severe nonmonotonicities exist as well. 

We assume that the initial guess at the step size is off by a given power of two, and 
report the total trial sample size that yields a success rate (acceptance rate between 
25 and 45%) of at least 95%. If one's initial guess is exactly right, a total sample 
size of 120-180 is adequate, and most efficient is 40 trials at each of three step sizes. 
Not surprisingly, this minimal number of step sizes ceases to be efficient quickly when 
one's initial guess is off. When the initial guess is too large by a factor of two, roughly 
200 total trials are needed, and there are several equally efficient ways of getting there: 
20 each at 9 or 11 levels, 30 each at 7 levels, or 40 at 5 levels. When the initial guess 
is too high by a factor of 4, 8, 16, or 32, 20 trials at each level are appropriate, and 
the number of levels should be 11, 11, 13, and 15 respectively: the number of levels 
needs to be large enough so that at least two step sizes smaller than the optimal are 
attempted. Overestimating the step size by a factor of 64 can be overcome with 30 
trials at each of 15 levels, and an overestimate by a factor of 128 should be avoided. 

The situation is not symmetric when one underestimates the optimal step size: in 
fact, it is better to underestimate it a bit, which is unsurprising since lower step sizes 
imply acceptance rates of closer to 0.5, making each trial more informative. Forty 
trials at each of three levels continues to work well if the initial guess is too low by a 
factor of two or four. Underestimates of factors of 8, 16, and 32 require total sample 
sizes of about 180, 220, and 280 respectively, and any choice 20, 30, or 40 per step 
size is about equally effective. 

Of course, one may desire a lower or higher than 95% probability of getting an 
acceptance rate between 25% and 45%. YADAS uses thirteen levels and sample sizes 
of 50 at each level, and this leads to zero failures in practice even when there are 
hundreds of parameters. YADAS also sends the chosen step sizes to output files. 

3 Implementation 

In this section we discuss the YADAS classes that can be used to tune step sizes. For 
an introduction to YADAS 's software design, please see Graves (2001, 2003, 2007). 

In YADAS, the definition of an algorithm is a collection of objects implementing 
the MCMCUpdate interface. YADAS loops through this collection of updates, calls the 
update () method of each, and each of these attempts to change the values of one 
or more unknown parameters. One complete cycle through the collection of updates 
is one iteration in the MCMC algorithm, and the current values of the parameters 
are then sent to output files. The simplest example of a object implementing the 
MCMCUpdate interface is a parameter. A parameter's update () method loops through 
the components in the parameter, attempting a Gaussian random walk Metropolis 
move to each. Another type of update is the MultipleParameterUpdate, which has 
the capability of proposing a Metropolis or Metropolis-Hastings move that affects 
multiple parameters simultaneously: the proposed move is defined using a Perturber. 
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3.1 UpdateTuner 

To introduce step size tuning to YADAS, we built a class called UpdateTuner that 
itself implements MCMCUpdate and can be inserted into the collection of updates in 
a YADAS algorithm. The UpdateTuner supervises another update, altering its step 
sizes and monitoring its acceptance rates during a trial phase, analyzing the results 
of the trial experiment, and then selecting step sizes for the final phase of the MCMC 
algorithm. To define an UpdateTuner, one specifies: 

• An object implementing the TunableMCMCUpdate interface, which will be de- 
scribed below: this is the update step whose step size we are trying to tune. A 
TunableMCMCParameter is an example of such an object; 

• an array of initial guesses for step sizes, one for each step in the update; 

• an integer defining the number of trial step sizes to use in the experiment; 

• an integer defining the number of attempts per trial step size; and 

• a target acceptance rate (we typically use 1/e). 

The last five of these arguments can be omitted; they have default values of: the step 
sizes assigned to the update, 13, 50, 1, and 1/e. 

3.2 TunableMCMCUpdate 

The TunableMCMCUpdate interface extends the MCMCUpdate interface in the following 
way. It introduces several new methods: 

• getStepSizes () returns a vector of doubles, the values of the step sizes cur- 
rently used by the update. This is actually not used by the UpdateTuner class 
so may disappear. 

• setStepSize has two signatures; one changes a single step size, the other the 
entire vector. 

• acceptances () returns a vector of numbers of acceptances, one for each step 
size. 

• tuneoutputO writes the ultimately selected step sizes to a file with extension 
. tun. 

If a user wants to write a new type of update and wants it to be tunable, the user must 
ensure that the new update includes appropriate definitions for all these methods. 
Important update classes that are tunable are MCMCParameter and, unsurprisingly, 
TunableMultipleParameterUpdate. 
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3.2.1 Additions to MCMCParameter 



We included all the tuning code in the MCMCParameter class itself rather our first in- 
tention, which was to have a TunableMCMCParameter subclass. The TunableMCMCUpdate 
methods do reasonable things. An MCMCParameter includes a vector of unknown 
scalar parameters, each of which has a step size, and these are the step sizes that are 
accessed. 

Defining parameters in a YADAS application with tuning is identical to applica- 
tions without, the only exception being that the step sizes included in the definition 
of a parameter are initial guesses only. 

3.2.2 TunableMultipleParameterUpdate 

In YADAS, one may add steps to an MCMC algorithm that (attempt to) improve 
mixing by moving multiple parameters together. This functionality is centered in 
the MultipleParameterUpdate class, and especially the Perturber interface. A 
MultipleParameterUpdate consists only of an array of parameters and a Perturber. 
The Perturber includes all the specialization, such as the method perturb (), which 
produces proposed new values of the parameters, given their old values, and also 
calculates the ratio of proposal probabilities that appears in the acceptance rate for 
Metropolis-Hastings moves. The perturb () method frequently depends on one or 
more tunable step sizes. A canonical example is the NewAddCommonPerturber. It 
is quite common that a posterior distribution is approximately a function of differ- 
ences of some parameters, so that the posterior is relatively insensitive to trans- 
formations that add a common constant to all those parameters. This is what 
NewAddCommonPerturber does: it samples a random Gaussian Z with some stan- 
dard deviation s, and proposes a Metropolis move in which several parameters are 
incremented by Z. More generally, the parameters can be divided into groups, each 
of which gets its own random Zj and each of which has its own standard deviation 
Sj. In this case we want to tune the s/s. 

All the Perturbers included in the YADAS package are tunable. We did not 
include the tuning capability in the Perturber interface itself (rather it is in the 
TunablePerturber interface) because we didn't want to make it more difficult than 
necessary for users to write new Perturbers, but we will try to ensure that the ones 
we write £1X6 clS usable as possible, and that includes making them tunable. 

YADAS also includes a class ReversibleJumpUpdate which is not yet tunable; 
further study is required before it is clear that tuning acceptance rates is appropriate 
in reversible jump problems. 
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4 Examples 



Finally, in this section, we present some examples of the tuning process in practice. 
First, we work with a normal example with unknown mean and variance and tune 
both step sizes. Second, we work with a one-way ANOVA that has been parameterized 
poorly with resultant poor mixing, fix the mixing with a MultipleParameterUpdate, 
and tune its step size along with the others in the problem. The source code and data 
for these examples are available on the YADAS website yadas.lanl.gov; follow the 
"Download" link and then download the zipped directory of examples. 

4.1 Normal example 

In this problem, Example 10 on the YADAS web site, we have data yi ~ N(fi, a 2 ) 
for i = 1, ...,JV, with priors ji ~ Af(a M ,&^) and a ~ T(a a ,b a ) (according to our 
parameterization, a has prior mean a a b a ). Each iteration in the MCMC algorithm 
has two steps: a Metropolis move in which we propose a Gaussian random walk move 
to fi (i.e. fi' = [i+siZi, where Z\ ~ N(0, 1), and a Metropolis-Hastings move in which 
we propose a lognormal adjustment to a (i.e. a' = aexp(s 2 Z 2 ), where Z 2 ~ A r (0, 1). 
s± and s 2 must be tuned. 

The key piece of code in the tuning application is 

MCMCUpdate[] updatearray = new MCMCUpdate[] 

{ new UpdateTuner(mu, dO.rO'mumss") , 13, 50, 1, Math. exp(-l) ) , 

new UpdateTuner (sigma, dO .r ("sigmamss") , 13, 50, 1, Math. exp(-l) ) }; 

in which we define the algorithm to consist of two update steps as described above, 
and whose step sizes will be tuned. Beginning with the first UpdateTuner, mu is 
the definition of the update step (here, a Gaussian random walk update to fi). The 
expression dO . r ( ' 'mumss ' ' ) defines a vector of initial guesses for step sizes (only one 
here, and this expression gets them from an input file). The 13 refers to the number 
of different step sizes to experiment with, the experiment will attempt 50 moves for 
each step size, the 1 means only one cycle of experimentation, and the last argument 
implies that the step size will be tuned for an acceptance rate of 1/e. The fact that 
o will be updated using Gaussian random walk moves on the log scale has been 
determined elsewhere (sigma was defined to be a MultiplicativeMCMCParameter 
instead of just a MCMCParameter). 

4.2 Badly parameterized one-way ANOVA example 

This example, Example 11 on the YADAS website, shows that the tuning proce- 
dure works even in cases where multiple parameters are updated at once. This is 
a one-way analysis of variance example that is commonly used to illustrate mixing 
difficulties in MCMC algorithms that can be solved with reparameterization. Data 
Uij are normal with means /ij and common standard deviation a for % — 1, 2, . . . , I 
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Figure 2: Trace plots for fii,9,\oga, and log 5 in the one-way ANOVA example. 



and j = 1,2, .. . , rij. Here we have taken the fii to have a N(6,S 2 ) prior, 9 has a 
flat hyperprior, and a and S have Gamma priors. This parameterization works fine 
except when 5 is too small compared to a (the sample sizes also drive what is meant 
by "too small"). In this case, the Hi and 9 have high posterior correlation so that it 
works poorly to update them individually. Many solutions exist including reparame- 
terization or block Gibbs updates, but here we use a MultipleParameterUpdate that 
augments the standard variable-at-a-time Gaussian random walk Metropolis with an 
additional step that proposes adding a common random Gaussian perturbation to 
all the /ij and 9. The posterior distribution is relatively invariant to moves like this, 
at least when 5 is small, as it is in the supplied input files. The code to define the 
update algorithm is as follows: 

MCMCUpdate[] updatearray = new MCMCUpdate [] { 

new UpdateTuner (mu) , new UpdateTuner (theta) , 
new UpdateTuner (sigma) , new UpdateTuner (delta) , 
new UpdateTuner ( new TunableMultipleParameterUpdate 
( new MCMCParameter [] {mu, theta}, 
new NewAddCommonPerturber ( new int [] [] { d2.i(0), dO.i(O) >, 

d0.r("mtmss") ), direc + "mtu") ) }; 

Here we have used the default values for the experimental design descriptors (13 
different trial step sizes for 50 attempts each, and so on). This example successfully 
tunes a total of four scalar parameters that are updated on the linear scale, two 
updated on the log scale, and one multiple parameter update. Trace plots of the 
MCMC iterations are shown in Figure [2] 
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5 Conclusions 



There are many other effective ways of tuning step sizes in MCMC algorithms; we 
do not claim that our method is substantially superior to the alternatives, but it is 
simple, intuitive, and versatile. We readily admit that many problems (for example, 
those with posterior distributions with curved contours or multiple separated modes) 
generally feature MCMC difficulties that cannot be adequately solved by tuning step 
sizes alone. However, our method works well for even a large number of tuning 
parameters and in a variety of posterior distributions. 
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